Forecasting Weekly Sales Across Walmart Stores Using Prophet

The Problem at Hand

The purpose of this project is to forecast weekly sales for 45 Walmart stores in the USA and to determine whether additional features improve those forecasts. Why is forecasting useful? Forecasts can aid the logistical management of stores, for example stock and staff planning: ensuring stock meets demand while keeping each store neither under- nor overstaffed, so that it remains cost effective.

The dataset used for this project comes from Kaggle, authored by M Yasser H, and can be found by following this link: Walmart Dataset. The data spans 5/2/2010 to 26/10/2012 (day/month/year), with sales recorded weekly. Additional variables within this dataset include: Store (store number), Holiday_Flag (binary, 1 if a holiday occurred during the week), Temperature (Fahrenheit), Fuel_Price (US$ per gallon), CPI (consumer price index) and Unemployment (unemployment rate).

Exploratory Data Analysis

sales_distribution.png

Figure 1: Boxplot of weekly sale ranges for each store.

The initial exploration examined the sales distribution for each store to identify any patterns or unique cases. Figure 1 shows the range of weekly sales for all 45 stores. A key detail is the presence of outliers for many stores, typically representing weeks with unusually high sales. Higher sales usually stem from factors such as promotions and holiday periods; in this case the data only provides information on holiday events.

Figure 2: Stacked plot of total sales across all stores by year for holiday and non-holiday weeks (Interactable).

Figure 2 shows the total sales for each year and the proportion that occurred during holiday weeks. The percentage of each year's sales that fell in holiday weeks was 8.7%, 8.4% and 4.9% for 2010, 2011 and 2012 respectively. Although 2012 had the lowest total and holiday sales, this is likely explained by the data ending on 26/10/2012: the Christmas period is missing for this year, which may contribute to the lower values.
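
The holiday share quoted above can be computed along the following lines. This is a minimal sketch rather than the code used in the project: the column names Date and Weekly_Sales, the day-month-year date format and the toy values are all assumptions made for illustration (only Holiday_Flag is named in the text).

```python
import pandas as pd

def holiday_sales_share(df):
    """Percentage of each year's total sales that fell in holiday weeks."""
    # Assumed column names and date format (day-month-year)
    year = pd.to_datetime(df["Date"], format="%d-%m-%Y").dt.year
    # Keep sales from holiday weeks, zero out the rest
    holiday_sales = df["Weekly_Sales"].where(df["Holiday_Flag"] == 1, 0.0)
    share = holiday_sales.groupby(year).sum() / df["Weekly_Sales"].groupby(year).sum()
    return (share * 100).round(1)

# Toy data, not taken from the real dataset
toy = pd.DataFrame({
    "Date": ["05-02-2010", "12-02-2010", "19-02-2010", "04-02-2011", "11-02-2011"],
    "Weekly_Sales": [100.0, 300.0, 600.0, 200.0, 200.0],
    "Holiday_Flag": [0, 1, 0, 0, 1],
})
share = holiday_sales_share(toy)
print(share)
```

Because the share is taken relative to each year's own total, a year with missing months (such as 2012 here) is only comparable to the others if its holiday weeks are equally represented.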

Figure 3: Lineplot of weekly sales for upper 33% performance stores (Interactable).

Figure 4: Lineplot of weekly sales for middle 33% performance stores (Interactable).

Figure 5: Lineplot of weekly sales for lower 33% performance stores (Interactable).

Figures 3-5 show store sales trends over time, with stores divided into three groups by a tercile split on sales: high performance stores are the top 33%, medium performance stores the middle 33% and low performance stores the lowest 33%. Figure 3 suggests that top performing stores typically share similar patterns, with major peaks around Christmas. Medium performance stores follow the same trend; however, stores 28 and 41 notably outperformed the rest of their group outside the Christmas period, suggesting they may have been better placed in the high performance group. Additionally, store 18 saw decreased sales during September 2011 before recovering in October/November, implying that an event occurred during this period, although the cause is unknown. Finally, low performance stores also typically share the same pattern, with some exceptions. For example, store 33 does not see as dramatic an increase in sales during the Christmas period, with sales remaining roughly stable throughout the time span, whereas store 36 shows a decline in sales over time.
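
A three-way split of this kind can be sketched with pandas. The version below assumes stores are ranked by mean weekly sales and that the sales column is called Weekly_Sales; neither detail is stated in the text, and the toy data is purely illustrative.

```python
import pandas as pd

def performance_groups(df):
    """Assign each store to a low/medium/high tercile by mean weekly sales."""
    # Assumption: ranking uses the mean of Weekly_Sales per store
    mean_sales = df.groupby("Store")["Weekly_Sales"].mean()
    # q=3 cuts at the 1/3 and 2/3 quantiles, giving three roughly equal groups
    return pd.qcut(mean_sales, q=3, labels=["low", "medium", "high"])

# Toy data: six stores, two weeks each
toy = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "Weekly_Sales": [10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60],
})
groups = performance_groups(toy)
print(groups)
```

A quantile split fixes the group sizes rather than the sales thresholds, which is one reason stores near a boundary (such as stores 28 and 41 above) can look misplaced.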

Feature Importance

Following the exploratory analysis, feature importance was assessed to determine whether any additional features would improve model accuracy, even though Prophet is a time series model that does not typically rely on additional features.

cor_mat.png

Figure 6: Correlation matrix of numerical features and their relationship with sales.

Figure 6 shows little relationship between the available features, the strongest being between CPI and Unemployment (-0.30), a weak negative relationship. No feature had a strong relationship with weekly sales, suggesting they may be unnecessary for the Prophet model.

shap.png

Figure 7: Beeswarm plot of SHAP values for additional features.

Additionally, Figure 7 provides further insight into feature impact on weekly sales. In this case, feature effects were evaluated using an XGBoost model to investigate their effect on model accuracy. The SHAP values indicated that unemployment and CPI generally had the greatest impact on weekly sales, whilst holiday flags had minimal impact, with the higher values potentially relating to the Christmas periods specifically. However, it should be noted that the direction of each variable's effect is somewhat ambiguous, although higher unemployment and CPI values generally appeared to reduce sales. Overall, due to the lack of clear contributions, no additional features were added to the Prophet model, to avoid reducing accuracy.

Methodology

In order to prepare the data for modelling, the first step was to check for missing values. This was done using a function called check_zeros_nas, which totals the NA and 0 values in each column and stores the results. The output of this function showed no missing values or unexpected 0 values. Following this, the "date" variable was converted to the correct date-time format before being passed to Prophet.
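
The body of check_zeros_nas is not shown in this write-up, so the following is a hypothetical reconstruction based only on the description above. The helper to_prophet_frame, the column names Date and Weekly_Sales and the day-month-year format are likewise assumptions; the ds and y column names are the ones Prophet expects as input.

```python
import pandas as pd

def check_zeros_nas(df):
    """Hypothetical reconstruction: count NA and zero values per column."""
    return pd.DataFrame({
        "n_na": df.isna().sum(),
        "n_zero": (df == 0).sum(),
    })

def to_prophet_frame(store_df):
    """Illustrative helper: parse dates and rename to Prophet's ds/y columns."""
    out = pd.DataFrame({
        "ds": pd.to_datetime(store_df["Date"], format="%d-%m-%Y"),
        "y": store_df["Weekly_Sales"],
    })
    return out.sort_values("ds").reset_index(drop=True)

# Toy data with one deliberate missing temperature
toy = pd.DataFrame({
    "Date": ["12-02-2010", "05-02-2010", "19-02-2010"],
    "Weekly_Sales": [1500.0, 1600.0, 1400.0],
    "Holiday_Flag": [1, 0, 0],
    "Temperature": [38.5, None, 39.9],
})
report = check_zeros_nas(toy)
frame = to_prophet_frame(toy)
print(report)
print(frame)
```

Zeros in Holiday_Flag are expected, since it is a binary indicator, so only zeros in columns such as sales or CPI would count as unexpected.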

As mentioned, additional features were tested using an XGBoost model. The XGBoost regression model was fitted using all available features, and the fitted model was then passed to both permutation importance and SHAP.
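
To illustrate what permutation importance measures, here is a small hand-rolled sketch. It is not the project's code: an ordinary least squares fit stands in for the XGBoost model, the data is synthetic, and the function name permutation_importance_sketch is invented for this example. For brevity the score is computed on the training data; a held-out set would normally be preferable.

```python
import numpy as np

def permutation_importance_sketch(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link with the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((predict(X_perm) - y) ** 2) - baseline)
        importances[j] = np.mean(increases)
    return importances

# Synthetic data: the target depends on column 0 only, column 1 is pure noise
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + data_rng.normal(scale=0.1, size=200)

# Stand-in model (NOT XGBoost): least squares with an intercept term
coef, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(len(X))]), y, rcond=None)

def predict(A):
    return np.column_stack([A, np.ones(len(A))]) @ coef

importances = permutation_importance_sketch(predict, X, y)
print(importances)
```

A feature whose shuffling barely changes the error, like the noise column here, contributes little to the model's predictions, which is the pattern that would justify leaving a feature out.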

Before forecasting, cross validation (Prophet uses rolling-origin evaluation) was applied to the initial model with an initial training window of 730 days, a period of 90 days between cutoffs and a 90 day horizon. This cross validation was used to evaluate how well the model forecasts future data and to tune the changepoint parameter.
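
The sketch below shows how rolling-origin cutoffs can be derived from an initial window, a period and a horizon. It is a simplified illustration of the idea, not Prophet's own implementation, which may differ in detail; the function name is invented here, and the start and end dates are the ones quoted for the dataset.

```python
from datetime import date, timedelta

def rolling_origin_cutoffs(first, last, initial_days, period_days, horizon_days):
    """Cutoff dates for rolling-origin evaluation, working back from the end of the data."""
    initial = timedelta(days=initial_days)
    period = timedelta(days=period_days)
    horizon = timedelta(days=horizon_days)
    cutoffs = []
    # The latest cutoff leaves exactly one horizon of data to score against
    cutoff = last - horizon
    # Each cutoff must leave at least `initial` days of training history
    while cutoff >= first + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)

cutoffs = rolling_origin_cutoffs(
    first=date(2010, 2, 5),
    last=date(2012, 10, 26),
    initial_days=730,
    period_days=90,
    horizon_days=90,
)
# In this sketch: two cutoffs, 2012-04-29 and 2012-07-28
print(cutoffs)
```

For each cutoff the model is trained on data up to that date and scored on the following 90 days, so a 730 day initial window on under three years of data leaves room for only a small number of folds.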

Forecasting was then conducted using the same setup, as it proved effective, with a changepoint value of 0.5.

Forecasting Results

forecasting_error.png

Figure 8: Barplot of MAPE scores based on cross validation.

kmeans_plot.png

Figure 9: Scatter plot of store forecast performance groupings based on MAPE score.

Before forecasting, cross validation of the Prophet model was used to assess performance on unseen data and guard against overfitting. Figures 8 and 9 show the cross validation results, where MAPE scores ranged between 0.02 and 0.1, i.e. 2% and 10% respectively. This implies the forecast error is generally low relative to a typical standard: 0%-10% high accuracy, 11%-20% good accuracy and 21%-30% acceptable accuracy. However, stores within the group 1 cluster may require additional features to improve forecast accuracy, as they have the highest error. Forecast accuracy was assessed over a 13 week (3 month) period into the future; a longer forecast would require more data to maintain accuracy, as evidenced by the forecast plots. Overall, this implies the forecasts generated by the model are highly accurate, suggesting seasonality alone may be enough to explain weekly sales.
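
For reference, MAPE and the accuracy bands quoted above can be expressed as follows. This is a minimal sketch with invented function names and toy numbers; the label returned for scores above 30% is a placeholder, since the text does not name a band for that range.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    # Undefined when an actual value is zero; weekly sales are assumed positive
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def accuracy_band(score):
    """Map a MAPE fraction onto the bands quoted in the text."""
    pct = score * 100
    if pct <= 10:
        return "high"
    if pct <= 20:
        return "good"
    if pct <= 30:
        return "acceptable"
    return "outside quoted bands"  # placeholder label, not from the text

# Toy example: both forecasts are off by 10% of the actual value
score = mape([100.0, 200.0], [110.0, 180.0])
print(score, accuracy_band(score))
```

Because each error is divided by the actual value, MAPE lets stores with very different sales volumes be compared on the same scale.
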
grid_1.png

Figure 10: Lineplots for stores 1-9 showing model uncertainty and forecast prediction whilst identifying potential anomalies.

grid_2.png

Figure 11: Lineplots for stores 10-18 showing model uncertainty and forecast prediction whilst identifying potential anomalies.

grid_3.png

Figure 12: Lineplots for stores 19-27 showing model uncertainty and forecast prediction whilst identifying potential anomalies.

grid_4.png

Figure 13: Lineplots for stores 28-36 showing model uncertainty and forecast prediction whilst identifying potential anomalies.

grid_5.png

Figure 14: Lineplots for stores 37-45 showing model uncertainty and forecast prediction whilst identifying potential anomalies.

Figures 10-14 show the forecast and associated uncertainty for each store, where red points indicate potential anomalies. However, anomalous points should be treated with some skepticism, as they are flagged purely because they fall outside the model's expected range. Anomalous points that fall far from the expected range are more likely to be correctly identified, as many of them occur during the Christmas period, when a dramatic increase in sales is expected. Due to a lack of data, other extremes are more difficult to interpret and could correspond to a number of events, such as other holiday periods or store closures. Furthermore, forecast accuracy varied at different points of the horizon, in particular during the future Christmas period and at the endpoint of the horizon. This likely stems from the model's difficulty in capturing Christmas peaks and from having less data to draw on when predicting further into the future. This is why a period of 13 weeks was chosen, as a longer forecast would likely bring greater uncertainty and reduce the model's forecasting accuracy. However, if additional data were available, a forecast of 6 months or 1 year may have been possible without a decrease in accuracy. Additionally, stores with more ambiguous trends, such as store 43, proved more difficult to forecast accurately, suggesting seasonality may not be the sole driver for some stores; further investigation would be required to identify features which may impact their weekly sales.
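
One simple way to flag points outside the model's expected range is to compare each observation with the forecast's uncertainty interval. The sketch below assumes that is how the red points were produced, which the text does not confirm. The yhat_lower and yhat_upper columns follow the names used in Prophet's forecast output, while flag_anomalies, the excess column and the toy values are invented for illustration.

```python
import pandas as pd

def flag_anomalies(observed, forecast):
    """Mark observations outside [yhat_lower, yhat_upper] and measure by how much."""
    merged = observed.merge(forecast[["ds", "yhat_lower", "yhat_upper"]], on="ds")
    below = merged["yhat_lower"] - merged["y"]   # positive if under the interval
    above = merged["y"] - merged["yhat_upper"]   # positive if over the interval
    # Distance beyond the nearest bound, zero for points inside the interval
    merged["excess"] = pd.concat([below, above], axis=1).max(axis=1).clip(lower=0)
    merged["anomaly"] = merged["excess"] > 0
    return merged

# Toy data: one point inside, one far above, one slightly below the interval
dates = pd.to_datetime(["2011-12-09", "2011-12-23", "2012-01-06"])
observed = pd.DataFrame({"ds": dates, "y": [100.0, 250.0, 90.0]})
forecast = pd.DataFrame({
    "ds": dates,
    "yhat_lower": [90.0, 95.0, 95.0],
    "yhat_upper": [110.0, 120.0, 115.0],
})
result = flag_anomalies(observed, forecast)
print(result)
```

Keeping the size of the excess alongside the flag makes it possible to separate points that only just leave the interval from those that fall far outside it, in line with the caution above.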

decomp_s1.png

Figure 15: Time series decomposition for store 1.

decomp_s36.png

Figure 16: Time series decomposition for store 36.

decomp_s43.png

Figure 17: Time series decomposition for store 43.

To further understand forecasting accuracy, STL (Seasonal-Trend decomposition using LOESS) decompositions were produced. In the examples shown, the seasonal and trend components generally follow the original data: this is true for stores 1 and 43 but not for store 36, where other factors are likely behind its steady decline. The residuals are closest to 0 during the summer of 2011, suggesting model performance is highest in this period. However, residual clustering outside of the summer period, particularly for store 43, implies that external factors aside from seasonality affected weekly sales. This in turn supports the use of anomaly detection and strengthens the argument for further investigation into these periods or into specific stores.

Summary and Future Suggestions

Overall, the forecasting results gave insight into store trends and patterns, with seasonality a dominant factor in driving weekly sales. However, there is evidence of additional factors at play, such as holidays in the case of the Christmas period and other unknown factors, which warrant further investigation. Additionally, although all forecasts are highly accurate, with most falling below a 6% error margin, stores in the group 1 MAPE cluster could still benefit from additional features to further improve accuracy, particularly stores with unique trends.

As evidenced, a further investigation could identify other drivers of weekly sales. Some suggestions include promotions, internal store/supply issues and store closures; this information could improve the model's capabilities and help explain the causes of anomalous points, reducing uncertainty. Some further non-store-related features could include population demography, rural/urban locale (population counts), crime rates and median population income. These features could provide insight into weekly sales figures; for example, differences in sales volumes could be expected between rural and urban locales as a result of population density. The same applies to median population income, where areas with higher incomes would likely see more sales due to greater disposable income.

As a final note, this type of data could also be used to predict the performance of new stores. For example, dividing stores into performance groups as done previously, and using the group as the target in place of weekly sales, could allow a classification algorithm to place new stores into performance categories based on a number of features. This method could provide insight when deciding between new store locations.